Sesla Transcriber: a Speech Transcription Tool That Adapts to Your Skill and Time Budget

Authors

  • Matthias Sperber
  • Graham Neubig
  • Satoshi Nakamura
  • Alex Waibel
Abstract

We present a speech transcription tool targeted at situations in which cost is a critical or limiting factor. The tool actively guides the transcription process by taking an automatically created transcript as a starting point and asking for correction of only the parts likely to contain errors. The transcriber specifies a time budget, and the software uses models of transcription accuracy and cost to choose which segments should be transcribed to achieve the highest error reduction. In previous work, this approach was found to be 25% more efficient than cost-insensitive approaches. The cost model is adapted to the transcriber on the fly during the transcription process, so no user enrollment is necessary. The segmentation is updated regularly to reflect improved cost models and to recover from potential time prediction errors. The user interface was designed to be easy to learn and efficient to use. It allows either transcribing each segment from scratch or post-editing, and has logging features that allow detailed user studies.

1. COST-SENSITIVE TRANSCRIPTION

This paper describes a new tool for efficient manual correction of speech transcripts. Our tool uses an automatically created transcript as a starting point and guides the transcriber through the correction of a selection of segments that are likely to contain errors. The general strategy is to focus only on those erroneous parts and to trust the speech recognizer for the other parts, in order to cut transcription costs. Our tool asks the user to specify a time budget (for example, 30 minutes of annotation) and automatically chooses an appropriate number of segments for correction so that the time budget is met. The locations and sizes of these segments are chosen such that the expected reduction of errors is maximized given the time budget, according to the SESLA method (Segmentation for Efficient Supervised Language Annotation) described in [1].
Specifically, the tool predicts both transcription time and error reduction for transcribing any possible segment in a speech. Using these predictions, it computes an optimal segmentation into segments to transcribe and segments to skip. There is a trade-off to consider when choosing between shorter and longer segments: while choosing very small segments (e.g. single words) would allow the transcriber to concentrate only on parts that have a very high probability of error, longer segments are desirable from a cognitive point of view, as they reduce the cognitive overhead caused by context switches. A global constrained optimization strategy that is based on the time and error reduction predictions and considers all possible segmentations allows finding an optimal trade-off in a principled way. Savings in human effort of 25% were observed, compared to the traditional approach of choosing low-confidence segments from a fixed segmentation.

To obtain the transcription time and error reduction predictions needed to find the optimal segmentation, the tool proceeds as follows. Confidence scores provided by the automatic speech recognizer are used to estimate the probability of error. Transcription time is predicted via Gaussian Process regression [2], using segment length, audio duration, and average word confidence as features. The model is continually retrained on the observed transcription times during the ongoing transcription process. The regressor is initialized with a sensible prior so that rough predictions are possible even for new users, so no initial user enrollment is required. Starting from this prior model, the regressor is then retrained regularly to reflect the transcriber’s characteristics with gradually increasing accuracy. To take advantage of the improving user models, the segmentation of the remainder of the speech being transcribed is updated regularly as well.
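As a rough illustration of the time predictor described above, the following sketch implements Gaussian Process regression with a prior mean in plain Python. It is not the tool's implementation: the prior (four seconds of work per second of audio), the kernel length scales, the noise level, and all names such as TimeModel and prior_time are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

# Feature vector: (word count, audio seconds, mean ASR word confidence).
Features = Sequence[float]

def prior_time(x: Features) -> float:
    """Hypothetical prior: 4 seconds of work per second of audio."""
    return 4.0 * x[1]

def rbf(a: Features, b: Features,
        scales: Tuple[float, ...] = (10.0, 5.0, 0.5),
        variance: float = 25.0) -> float:
    """Squared-exponential kernel with assumed per-feature length scales."""
    d2 = sum(((ai - bi) / s) ** 2 for ai, bi, s in zip(a, b, scales))
    return variance * math.exp(-0.5 * d2)

def solve(A: List[List[float]], y: List[float]) -> List[float]:
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

class TimeModel:
    """GP regression on the residual between observed time and the prior."""

    def __init__(self, noise: float = 1.0) -> None:
        self.noise = noise              # assumed observation noise variance
        self.X: List[Tuple[float, ...]] = []
        self.y: List[float] = []
        self.alpha: List[float] = []

    def observe(self, x: Features, seconds: float) -> None:
        """Record one measured transcription time and refit the model."""
        self.X.append(tuple(x))
        self.y.append(seconds)
        n = len(self.X)
        K = [[rbf(self.X[i], self.X[j]) + (self.noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        resid = [t - prior_time(xi) for xi, t in zip(self.X, self.y)]
        self.alpha = solve(K, resid)

    def predict(self, x: Features) -> float:
        """Posterior mean: prior plus kernel-weighted residual correction."""
        return prior_time(x) + sum(a * rbf(x, xi)
                                   for a, xi in zip(self.alpha, self.X))

model = TimeModel()
segment = (8, 3.0, 0.7)
before = model.predict(segment)   # no data yet: falls back to the prior
model.observe(segment, 20.0)      # this transcriber was slower than the prior
after = model.predict(segment)    # moves from the prior toward 20 seconds
```

Before any observation the prediction equals the prior, which is what makes enrollment unnecessary in this sketch; each call to observe then pulls predictions for similar segments toward the transcriber's measured speed while leaving dissimilar segments close to the prior.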
Each segmentation update takes the current remaining time budget into account. This allows, for instance, removing (skipping) less promising segments if the transcriber’s progress has been slower than expected. Hence, the tool can rapidly recover from the prediction inaccuracies that are expected to occur during practical use, and make sure that the remaining time is used optimally. These updating strategies were proposed in [3], along with a fast segmentation algorithm that we employ in our tool.
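The budget-constrained choice of segments, and the re-planning with the remaining budget, can be sketched with a simple knapsack-style dynamic program. This is only an illustration and not the fast algorithm of [3]: it assumes that correcting a segment removes all of its expected errors (one minus the word confidence, summed over the segment), that time is measured in whole seconds, and that a hypothetical segment_cost function stands in for the learned time model.

```python
from typing import List, Optional, Tuple

def segment_cost(n_words: int, overhead: int = 4, per_word: int = 2) -> int:
    """Hypothetical time model in whole seconds: a fixed context-switch
    overhead per segment plus a per-word effort."""
    return overhead + per_word * n_words

def plan_segments(confidences: List[float], budget: int,
                  max_len: int = 10) -> Tuple[float, List[Tuple[int, int]]]:
    """Pick non-overlapping word ranges [start, end) that maximize the
    expected error reduction while staying within the time budget."""
    n = len(confidences)
    # err[i] = expected number of errors among the first i words.
    err = [0.0]
    for c in confidences:
        err.append(err[-1] + (1.0 - c))
    # best[i][b]: best reduction over the first i words with b seconds.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    choice: List[List[Optional[int]]] = [
        [None] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]          # option: skip word i-1
            for length in range(1, min(max_len, i) + 1):
                cost = segment_cost(length)
                if cost > b:
                    break                         # longer segments cost more
                gain = (best[i - length][b - cost]
                        + err[i] - err[i - length])
                if gain > best[i][b] + 1e-12:
                    best[i][b] = gain
                    choice[i][b] = length         # segment ends at word i
    # Walk back through the recorded choices to recover the segments.
    segments: List[Tuple[int, int]] = []
    i, b = n, budget
    while i > 0:
        length = choice[i][b]
        if length is None:
            i -= 1
        else:
            segments.append((i - length, i))
            b -= segment_cost(length)
            i -= length
    segments.reverse()
    return best[n][budget], segments

confidences = [0.9, 0.95, 0.3, 0.4, 0.9, 0.95, 0.9, 0.2, 0.9, 0.9]
gain, plan = plan_segments(confidences, budget=14)

# Re-planning: suppose the first planned segment (words 2-3) took 12 s
# instead of the predicted 8 s. Only 2 s remain for words 4 onward, so
# the previously planned second segment no longer fits and is dropped.
_, replanned = plan_segments(confidences[4:], budget=2)
```

With these toy numbers the initial plan covers the two low-confidence regions, and the re-planned remainder is empty, which mirrors how a slower-than-expected transcriber causes less promising segments to be skipped. The fixed per-segment overhead is what makes one longer segment cheaper than several single-word segments in this sketch.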

Similar resources

Transcribing against time

We investigate the problem of manually correcting errors from an automatic speech transcript in a cost-sensitive fashion. This is done by specifying a fixed time budget, and then automatically choosing location and size of segments for correction such that the number of corrected errors is maximized. The core components, as suggested by previous research [1], are a utility model that estimates ...

Consistency in transcription and labelling of German intonation with GToBI

A diverse set of speech data was labelled at three sites by 13 transcribers with differing levels of expertise, using GToBI, a consensus transcription system for German intonation. Overall inter-transcriber consistency suggests that, with training, labellers can acquire sufficient skill with GToBI for large-scale database labelling.

Transcriber: Development and use of a tool for assisting speech corpora production

We present "Transcriber", a tool for assisting in the creation of speech corpora, and describe some aspects of its development and use. Transcriber was designed for the manual segmentation and transcription of long duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions. It is highly portable, relying on the scripting language Tcl/Tk with exten...

An Environment for Testing Prosodic and Phonetic Transcriptions

An interactive speech transcription tool is described. Segmental and tonal transcription may be performed, and the transcriber may get instant feedback on the accuracy and adequacy of the transcription by synthesizing a speech waveform on the fly with the segmental and tonal transcriptions as input. This speech sound may then be examined auditorily. Transcription labels may be moved by simple d...

Transcribing with Annotation Graphs

Transcriber is a tool for manual annotation of large speech files. It was originally designed for the broadcast news transcription task. The annotation file format was derived from previous formats used for this task, and many related features were hard-coded. In this paper we present a generalization of the tool based on the annotation graph formalism, and on a more modular design. This will a...

Publication date: 2014